Abstract. We introduce a new combinatorial structure: superselectors. 
We show that superselectors subsume several important combinatorial 
structures used in the past few years to solve problems in group testing, 
compressed sensing, multi-channel conflict resolution and data security. 
We prove close upper and lower bounds on the size of superselectors and 
we provide efficient algorithms for their construction. Although our bounds 
are very general, when they are instantiated on the combinatorial structures 
that are particular cases of superselectors (e.g., (p, k, n)-selectors 
[15], (d, ℓ)-list-disjunct matrices [25], MUT$_k(r)$-families [28], FUT(k, α)-families 
[2], etc.) they match the best known bounds in terms of size 
of the structures (the relevant parameter in the applications). For appropriate 
values of parameters, our results also provide the first efficient 
deterministic algorithms for the construction of such structures. 



1 Introduction 

It is often the case that understanding and solving a problem means discovering 
the combinatorics at the heart of the problem. Equally, time and again 
it happens that the crucial step towards the economical solution of problems 
arising in different areas hinges on the efficient construction of the same 
combinatorial object. An interesting example is that of superimposed codes [26] (also 
known as cover-free families [20], strongly selective families [10], disjunct matrices 
[16], ...). Superimposed codes represent the main tool for the efficient solution of 
several problems arising in compressed sensing [11], cryptography and data security 
[27], computational biology [3], multi-access communication [36], database 
theory [26], pattern matching [24,34,32], distributed colouring [29], and circuit 
complexity [4], among others. Due to their importance, a lot of effort has 
been devoted to the design of fast algorithms for the construction of superimposed 
codes of short length. In this line of research a main result is the paper 
by Porat and Rothschild [33], who presented a very efficient polynomial time 
algorithm for that purpose. More recently, Indyk et al. [25] showed that optimal 
nonadaptive group testing procedures (i.e., superimposed codes) can be efficiently 
constructed and decoded. 

In the past few years it has also become apparent that combinatorial structures 
strictly related to superimposed codes lie at the heart of an even vaster 
series of problems. As quick examples, the selectors introduced in [9] were instrumental 
in obtaining fast broadcasting algorithms in radio networks, the (p, k, n)-selectors 
of [15] were the basic tool for the first two-stage group testing algorithm 
with an information theoretic optimal number of tests, and the (d, ℓ)-disjunct matrices 
of [25] were a crucial building block for the efficiently decodable non-adaptive 
group testing procedures mentioned above. 

It is the purpose of this paper to introduce superselectors, a new combinato- 
rial object that encompasses and unifies all of the combinatorial structures men- 
tioned above (and more). We provide efficient methods for their constructions 
and apply their properties to the solutions of old and new problems for which 
constructive solutions have not been shown so far. In particular, superselectors 
extend at the same time superimposed codes and several different generalizations 
of theirs proposed in the literature. 

When appropriately instantiated, our superselectors asymptotically match 
the best known constructions of (p, k, n)-selectors [15], (d, ℓ)-list-disjunct matrices 
[25], monotone encodings and (k, α)-FUT families [31,2], and MUT$_k(r)$-families 
for multiaccess channels [28, 1]. In some cases, e.g., for (p, k, n)-selectors and 
(d, ℓ)-list-disjunct matrices, we also improve on the multiplicative constant in the O 
notation. We show that optimal size superselectors (and hence all the above 
structures) can be easily constructed in time polynomial in n, the main dimension 
of the structure, though exponential in the second parameter p. This might 
be satisfactory in those applications, e.g., computational biology, where p ≪ n. A 
major open question is whether it is possible to deterministically obtain optimal 
size superselectors (or even selectors) in time subexponential in p. However, in 
cases when p is constant we note that our results provide the first known polynomial 
construction of optimal size (p, k, n)-selectors (and related structures). 

It should also be noticed that selectors, and similar combinatorial structures, 
generally have to be computed only once, since they can be reused in 
different contexts without the need to recompute them from scratch. Therefore, 
in the absence of better alternatives, it seems to make sense to prefer onerous 
algorithms that output structures of optimal size (the crucial parameter that 
will affect the complexity of algorithms that use selectors and similar structures 
in different scenarios) over more efficient construction algorithms that produce 
structures of suboptimal size. This brings us to another question. Most of the 
structures mentioned above, and subsumed by our superselectors, can also be 
obtained via expander graphs, or equivalently, randomness extractors. However, 
to the best of our knowledge, the best known explicit expander-based constructions 
give only suboptimal (w.r.t. the size) selector-like structures. Table 1 
summarizes how our results compare to the state of the art. The bounds are reported 
as they were given in the original papers, which produces a slight lack of 
uniformity. With this choice we may be requiring the reader to put 
a little effort into the comparisons, but we do not risk mistranslating the 
bounds from one notation into another. The main aim of the data in the table is 
to show that the generalization provided by the superselectors in no case implies 
a loss in terms of optimality of the structure size. In addition, the number of 
applications of superselectors we shall present in Section 3 seems to suggest that 
they represent a basic structure, likely to be useful in many contexts. 
Table 1. Bounds attained via SUPER-SELECTORS against best known bounds. 



2 The (p, v, n)-SUPER-SELECTOR 

Given two vectors $x, y \in \{0, 1\}^n$, we denote by $x \oplus y$ the Boolean sum of x 
and y, i.e., their componentwise OR. Given an m × n binary matrix M and an 
n-bit vector x, we denote by $M \oplus x$ the m-bit vector obtained by performing the 
Boolean sum of the columns of M corresponding to the positions of the 1's in 
x. That is, if x has a 1 in positions, say, 3, 7, 11, ..., then $M \oplus x$ is obtained by 
performing the $\oplus$ of the 3rd, 7th, 11th, ... columns of M. Given a set $S \subseteq [n]$, 
we use M(S) to denote the submatrix induced by the columns with index in S. 
Also, we use $a_S$ to indicate the Boolean sum of the columns of M(S). Given two 
n-bit vectors x, y we say that x is covered by y if $x_i \le y_i$ for each i = 1, ..., n. 
Note that if x is not covered by y, then x has a 1 in a position in 
which y has a 0. 
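
As a concrete illustration of the notation above, here is a minimal Python sketch of the Boolean sum, of the Boolean sum of the columns indexed by a set S, and of the covering relation. The function names (`boolean_sum`, `column`, `sum_of_columns`, `is_covered`) and the small matrix are our own illustrative assumptions; matrices are stored as lists of rows and columns are indexed from 0, whereas the text indexes them from 1.

```python
def boolean_sum(x, y):
    """Componentwise OR of two equal-length 0/1 vectors."""
    return [a | b for a, b in zip(x, y)]


def column(M, j):
    """Column j (0-based) of a matrix stored as a list of rows."""
    return [row[j] for row in M]


def sum_of_columns(M, S):
    """Boolean sum a_S of the columns of M whose index lies in S."""
    a = [0] * len(M)
    for j in S:
        a = boolean_sum(a, column(M, j))
    return a


def is_covered(x, y):
    """x is covered by y iff x_i <= y_i for every position i."""
    return all(a <= b for a, b in zip(x, y))


# Hypothetical 4 x 3 example matrix (rows listed top to bottom).
M = [
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 0],
    [1, 0, 0],
]
a_S = sum_of_columns(M, {0, 1})
```

In this example the third column is covered by `a_S` although it is not in S, which is exactly the kind of spurious column discussed in the lemmas of this section.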

We first recall the definition of (p, k, n)-selector, as given in [15]. A (p, k, n)-selector 
is an m × n binary matrix M such that for any subset S of p ≤ n columns, 
the submatrix M(S) induced by S contains at least k ≤ p rows of the identity 
matrix $I_p$. The parameter m is the size of the selector. 

Definition 1. Fix integers n, p, with p ≤ n, and an integer vector $v = (v_1, \ldots, v_p)$ 
such that $v_i \le i$ for each i = 1, ..., p. We say that an m × n binary matrix M 
is a (p, v, n)-SUPER-SELECTOR if M is an $(i, v_i, n)$-selector for each i = 1, ..., p. 
We call m the size of the SUPER-SELECTOR. 
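
The two definitions can be checked by brute force on small matrices. The sketch below is only meant to make the definitions concrete: it enumerates all column subsets, so it is exponential and unrelated to the constructions of this paper. The names `identity_rows_count`, `is_selector` and `is_super_selector` are hypothetical, and columns are indexed from 0.

```python
from itertools import combinations


def identity_rows_count(M, S):
    """Number of distinct rows of the identity matrix I_|S| found in M(S).

    A row of M(S) is a row of the identity matrix iff it has exactly one 1;
    we record which column of S carries that 1.
    """
    found = set()
    for row in M:
        ones = [c for c in S if row[c] == 1]
        if len(ones) == 1:
            found.add(ones[0])
    return len(found)


def is_selector(M, p, k):
    """True iff every set of p columns of M contains at least k rows of I_p."""
    n = len(M[0])
    return all(identity_rows_count(M, S) >= k
               for S in combinations(range(n), p))


def is_super_selector(M, v):
    """True iff M is an (i, v_i, n)-selector for each i = 1, ..., len(v)."""
    return all(is_selector(M, i, v_i) for i, v_i in enumerate(v, start=1))


# The 4 x 4 identity matrix is trivially a (4, (1, 2, 3, 4), 4)-super-selector.
I4 = [[1 if r == c else 0 for c in range(4)] for r in range(4)]

# Hypothetical 4 x 3 matrix used as a negative example.
M = [
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 0],
    [1, 0, 0],
]
```

For the matrix `M` above, the pair of columns {1, 2} exhibits only one row of the identity matrix, so `M` is a (2, 1, 3)-selector but not a (2, 2, 3)-selector.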

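Our main result on SUPER-SELECTORS is summarized in the following theorem, 
whose proof will be given in Section 4. 

Theorem 1. A (p, v, n)-SUPER-SELECTOR of size 

$$ m = O\Big( \max_{j=1,\ldots,p} k_j \log(n/j) \Big), \quad \text{where } k_j = \min\Big\{ \frac{3pej}{j - v_j + 1}, \frac{e j^2}{\log_2 e} \Big\}, $$

can be constructed in time polynomial in n and exponential in p. 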

The "identification" capabilities of a super-selector are as follows. 

Lemma 1. Let M be a (p, v, n)-super-selector, $v = (v_1, \ldots, v_p)$. Let S be 
any set of $x < v_p$ columns of M. Let $a_S$ denote the Boolean sum of the columns 
in S. Then, from $a_S$ it is possible to identify at least $v_{x+y}$ of the columns in S, 
where y is the number of columns of M which are not in S but are covered by 
$a_S$. Moreover, $y < \min\{j \mid x < v_j\} - x$. 

Proof. Let $T = \{b \mid b \notin S \text{ and } b \oplus a_S = a_S\}$, i.e., T is the set of columns not 
in S but covered by $a_S$. Then, y = |T|. We first prove the last statement. 
Claim. $y < \min\{j \mid v_j > x\} - x$. Let j* be a value of j achieving the minimum. 
The claim is a consequence of M being a $(j^*, v_{j^*}, n)$-selector. To see this, assume, 
by contradiction, that $|T| \ge j^* - x$. Let $T' \subseteq T$ with $|T' \cup S| = j^*$. Then, there 
are at least $v_{j^*} > |S|$ columns in $T' \cup S$ with a 1 in a row where all the other 
columns have a 0. Thus, there is at least one column of T' which has a 1 where 
all the columns of S have a 0. This contradicts the fact that all the columns of T 
(and hence of T') are covered by $a_S$. 

Since $x + y < j^* \le p$, and M is an $(x+y, v_{x+y}, n)$-selector, among the columns 
of $S \cup T$ there are at least $v_{x+y}$ which have a 1 where all the others have a 0. 
Let W be such a set of columns. By an argument analogous to the one used in the 
claim we have that $W \subseteq S$ and we can identify them. □ 

Remark 1. Notice that if $v_i > v_{i-1}$ for each i = 2, ..., p, then we have a situation 
that, at first sight, might appear surprising: the larger the number of spurious 
elements, i.e., columns not in S but covered by $a_S$, the more information we get 
on S, i.e., the more columns of S are identified. 

Remark 2. The same argument used in the proof above shows that Lemma 1 
also holds when $a_S$ is the component-wise arithmetic sum of the columns in S. 

3 Applications of the super-selectors 

Approximate Group Testing. In classical non-adaptive group testing [16], we 
want to identify a subset $P \subseteq [n]$, with $|P| \le p$, by using the minimum possible 
set of tests $T_1, \ldots, T_m$, where for each i = 1, ..., m, we have $T_i \subseteq [n]$. The 
outcome of test $T_i$ is a bit which is 1 iff $T_i \cap P \neq \emptyset$. If we require that the whole 
P is identified exactly, and non-adaptively, then it is known that 
$\Omega\big(\frac{p^2}{\log p} \log \frac{n}{p}\big)$ tests are necessary [16]. 

Cheraghchi [8], in the context of error-resilient group testing, Gilbert et al. 
[22], in the context of sparse signal recovery, and Alon and Hod [2] considered 
the case when one is interested in identifying some approximate version of P. It 
turns out [8] that at least $p \log \frac{n}{p} - p - e_0 - O\big(e_1 \log \frac{n - p - e_0}{e_1}\big)$ tests are necessary if 
one allows the identification algorithm to report a set P' such that $|P' \setminus P| \le e_0$ 
and $|P \setminus P'| \le e_1$. In other words, the algorithm can report up to $e_0$ false positives 
and up to $e_1$ false negatives. 

Let M be an appropriate $(p + e_0, v, n)$-SUPER-SELECTOR, with the components 
of vector v defined by $v_i = i - \min\{e_0, e_1\} + 1$. We can use M to attain 
approximate identification in the above sense. Proceeding in a standard way, 
map [n] to the indices of the columns of the super-selector and interpret the 
rows of the super-selector as the indicator vectors of the tests. Now the vector 
of the outcomes of the tests is the Boolean sum $a_P$ of those columns whose 
index is in P. Let P' be the set of the indices of the columns covered by $a_P$. 
We have $P \subseteq P'$ and by Lemma 1 also $|P'| \le |P| + e_0$. Moreover, from Lemma 
1 we also know that a set of positives $P'' \subseteq P$ can be exactly identified, with 
$|P''| \ge |P| - e_1$. Therefore, any set P* with $P'' \subseteq P^* \subseteq P'$ satisfies the bounds 
on the false positives and false negatives. 
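
The decoding step just described can be sketched in a few lines of Python. This is an illustrative sketch under our own assumptions: the function name `decode_approximate` is hypothetical, the matrix is a list of rows with 0-based column indices, and the example matrix is not claimed to be a super-selector. The sets P' and P'' are computed exactly as in the text: P' collects the columns covered by the outcome vector, and P'' collects those columns of P' that have a 1 in some row where every other column of P' has a 0.

```python
def decode_approximate(M, outcomes):
    """Return (covered, identified) for a vector of Boolean test outcomes.

    covered    : indices of columns covered by the outcome vector (the set P').
    identified : columns of P' owning a row in which they are the only
                 column of P' with a 1; such columns must be positives (P'').
    """
    m, n = len(M), len(M[0])
    covered = [c for c in range(n)
               if all(M[r][c] <= outcomes[r] for r in range(m))]
    identified = []
    for c in covered:
        for r in range(m):
            if M[r][c] == 1 and all(M[r][d] == 0 for d in covered if d != c):
                identified.append(c)
                break
    return covered, identified


# Hypothetical 4 x 3 test matrix; the true positive set is P = {0, 1}.
M = [
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 0],
    [1, 0, 0],
]
outcomes = [1, 1, 0, 1]  # Boolean sum of columns 0 and 1
covered, identified = decode_approximate(M, outcomes)
```

The identification rule is sound for any matrix: a row where the outcome is 1 must be hit by some positive, all positives lie in P', and the only column of P' with a 1 in that row is the candidate itself. The super-selector property is what guarantees how many columns are identified.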

Note that, for the interesting case of $e_0, e_1 = \Theta(p)$, the above group testing 
strategy is best possible since it uses $O(p \log \frac{n}{p})$ tests, which matches the 
lower bound of [8]. Cheraghchi [8] considers the case when some tests might be 
erroneous and only focuses on the case of zero false negatives. Alon and Hod 
[2] consider the case of zero false positives and obtain O(p log(n/p))-test procedures, 
which are in fact optimal for this case. Gilbert et al. [22] allow both 
false positives and false negatives but their procedures use $O(p \log^2 n)$ tests. 
Moreover, our implementation guarantees the exact identification of at least 
$p' - \min\{e_0, e_1\} + 1$ positives, where p' ≤ p is the actual number of positive 
elements. 

Additive Group Testing. We now consider exact group testing with additive 
tests. In this variant, the outcome of testing a subset $T_i$ is the number of positives 
contained in $T_i$, i.e., the integer $|T_i \cap P|$. 

It is known that $\Omega\big(\frac{p}{\log p} \log \frac{n}{p}\big)$ tests are necessary if we want to exactly identify 
P using additive tests (see, e.g., [23] and references therein). 

Proceeding analogously to the case of Approximate Group Testing, we can 
reformulate the additive group testing problem as follows: given positive integers 
n and p ≤ n, minimize the number m of rows of an m × n 0-1 matrix M such 
that any set P of up to p columns of M can be identified from their sum$^1$ $a_P$. 

Let M be an appropriate (2p, v, n)-SUPER-SELECTOR, with the components 
of vector v defined by $v_i = i$ for $i = 1, \ldots, \sqrt{p}$, and $v_i = \lceil i/2 \rceil + 1$ for $\sqrt{p} < i \le 2p$. 
We show that M provides a non-adaptive strategy for additive group testing with 
O(p log(n/p)) tests. 



1 Here sum is meant in the arithmetic way, i.e., z = x + y iff $z_i = x_i + y_i$ for each i. 



If $|P| < \sqrt{p}$, using the fact that $v_{|P|+1} = |P| + 1$, Lemma 1 and Remark 2 
imply that from $a_P$ we can identify the whole set P. 

If, otherwise, $|P| \ge \sqrt{p}$, by using the fact that $v_{2|P|} > |P|$, by Lemma 1 and 
Remark 2, from $a_P$ we can uniquely identify a subset R of P such that |R| > p/2 
and confine the elements of $P_1 = P \setminus R$ into a set $S_1$ such that $|S_1| < p$. In 
particular, $S_1 \cup R$ is the set of all columns of M which are component-wise not 
larger than $a_P$. 

Now, let $a_{P_1} = a_P - \sum_{i \in R} c_i$, where $c_i$ denotes the ith column of M and the 
additions and subtractions among vectors are meant component-wise. Clearly, 
$a_{P_1}$ is the sum of $P_1$, i.e., the columns that are still to be identified. Note also 
that $a_{P_1}$ can be computed from $a_P$ and the set R of identified columns without 
any additional test. 

We now have a smaller instance of the same problem from which we started, 
namely identifying the columns of $P_1$, among the ones in $M(S_1 \setminus R)$, from their 
sum $a_{P_1}$. Also notice that Lemma 1 still applies to the columns of $M(S_1 \setminus R)$. 
Therefore, repeatedly using the above argument we can eventually identify the 
whole set P. Again, no additional tests are required since we reinterpret, so to 
speak, the test outcomes in light of newly acquired knowledge. 
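
The iterative reinterpretation of the outcomes can be sketched as a "peeling" loop. The code below is a minimal sketch under our own assumptions, not the authors' implementation: the name `additive_peel_decode` is hypothetical, the matrix is a list of rows with 0-based columns, and the example matrix is chosen only to show two peeling rounds. In each round the candidate set is restricted to the columns that are component-wise not larger than the residual sum, the columns that own a private row among the candidates are identified, and their contribution is subtracted.

```python
def additive_peel_decode(M, a):
    """Peel positives off an arithmetic sum `a` of unknown columns of M.

    Returns (identified, residual); the residual is all zeros when the
    whole set of positives has been recovered.
    """
    m, n = len(M), len(M[0])
    residual = list(a)
    identified = set()
    candidates = set(range(n))
    while any(residual):
        # Columns still compatible with what remains to be explained.
        candidates = {c for c in candidates
                      if c not in identified
                      and all(M[r][c] <= residual[r] for r in range(m))}
        # A candidate with a row where it is the only candidate holding a 1
        # must be a positive, since that row of the residual is at least 1.
        newly = {c for c in candidates
                 if any(M[r][c] == 1
                        and all(M[r][d] == 0 for d in candidates if d != c)
                        for r in range(m))}
        if not newly:
            break  # no progress: the matrix is too weak for this input
        for c in newly:
            for r in range(m):
                residual[r] -= M[r][c]
        identified |= newly
    return identified, residual


# Hypothetical 4 x 3 matrix; the positives are columns 0 and 1.
M = [
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 0],
    [1, 0, 0],
]
a = [1, 1, 0, 1]  # arithmetic sum of columns 0 and 1
identified, residual = additive_peel_decode(M, a)
```

In this run the first round identifies column 0 only; after subtracting it, column 2 is no longer compatible with the residual, and the second round identifies column 1 without any new test.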

Finally, by Theorem 1 a super-selector M of size $O(p \log \frac{n}{p})$ can be constructed 
in time $O(n^p)$, which gives the desired result. We hasten to remark 
that in [23] Grebinski and Kucherov prove the existence of matrices M with an 
optimal $O\big(\frac{p}{\log p} \log \frac{n}{p}\big)$ number of rows for the Additive Group Testing described 
above. However, it is not clear whether their probabilistic construction can be 
derandomized, and at what cost. We thought it worthwhile to mention that our 
combinatorial tool gives, for free, a solution to the Additive Group Testing problem 
using a number of tests that differs from the optimal one by only a factor of 
log p. 

Monotone Encodings. Moran et al. posed the problem of efficiently constructing 
(n, k)-monotone encodings of size r (denoted by ME(n, k, r)), i.e., monotone 
injective functions mapping subsets of [n], of size up to k, into $2^{[r]}$ [31]. Monotone 
encodings are relevant to the study of tamper-proof data structures and arise also 
in the design of broadcast schemes in certain communication networks. A simple 
counting argument shows that ME(n, k, r) can only exist for $r = \Omega(k \log(n/k))$. 
We can use our super-selector for obtaining ME(n, k, O(k log(n/k))) in the 
following way. Let $M^{(t)}$ denote the (t, v, n)-SUPER-SELECTOR defined by the vector 
v whose ith component is $v_i = \lfloor i/2 \rfloor + 1$ for each i = 1, ..., t. By Lemma 
1, we have that for any $S \subseteq [n]$ with $|S| \le t/2$, from $a_S$ we can identify at least |S|/2 of the 
columns in $M^{(t)}(S)$. Let $S^{yes}$ (resp. $S^{no}$) be the subset of these columns which 
we can (resp. cannot) identify from $a_S$. 

We can obtain our mapping in the following way. Given $S_0 \in \binom{[n]}{\le k}$, we map 
it to the concatenation of the vectors $a_0, a_1, \ldots, a_{\log k}$, where $a_i$ is the Boolean 
sum of the columns of $M^{(k/2^{i-1})}(S_i)$, with $S_i = S_{i-1}^{no}$. 

The mapping is of size $\sum_{i=0}^{\log k} O\big(\frac{k}{2^i} \log \frac{n 2^i}{k}\big) = O(k \log(n/k))$, therefore of optimal 
size. Moreover, by observing that for each $S \subseteq T$ we have $a_S \le a_T$ and 
$S^{no} \subseteq T^{no}$, we also have that the mapping is monotone. By our Theorem 1 such a 
mapping can be deterministically computed in $O(n^k)$ time. 

Alon and Hod [2] defined (k, α)-FUT families in order to obtain ME(n, k, O(k log(n/k))) 
in a way analogous to the one we depicted above, i.e., by chaining $(k/2^t, 1/2)$-FUT 
families$^2$ of cardinality n for t = 0, 1, ..., log k. However, for optimal, i.e., 
O(k log(n/k))-size monotone encodings no explicit deterministic construction has 
been provided so far [2,31]. 

Selector-based data compression. Let M be a (2p, p + 1, n)-selector of size 
m = O(p log(n/p)). Let x be a binary vector with $\|x\|_0 \le p$. Define the encoding 
of x as the vector y equal to the componentwise OR of the columns of 
M corresponding to the positions of the 1's in x. Let $x_{i_1}, \ldots, x_{i_d}$, d ≤ p, be 
all the components of x such that $x_{i_1} = \ldots = x_{i_d} = 1$. By Lemma 1, there 
exist at most t other columns $m_{j_1}, \ldots, m_{j_t}$ of matrix M, t < p, such that 
$y = m_{j_1} \vee \ldots \vee m_{j_t} \vee m_{i_1} \vee \ldots \vee m_{i_d}$. 

Now, think of an "encoder" that works as follows: for a given vector x it first 
computes its encoding y, then it computes $A = \{i_1, \ldots, i_d\}$, $B = \{j_1, \ldots, j_t\}$, 
and subsequently it computes an ordered list L from $A \cup B$. Finally, the encoder 
computes a binary vector z of length 2p such that $z_k = 1$ if and only if the 
k-th element of the ordered list L is an element of A. The encoding of x is now 
the concatenated binary vector yz of length O(p log(n/p)) + 2p = O(p log(n/p)). 
One can see that x can be (efficiently) recovered from yz and that the length of 
the encoding yz of x is information theoretically optimal. 
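
The encoder and decoder described above can be sketched as follows. This is a minimal illustration under our own assumptions: the names `covered_columns`, `encode`, `decode` and the parameter `z_len` (standing for the length 2p of z) are hypothetical, the matrix is a list of rows with 0-based columns, and the small example matrix is not claimed to be a selector. The decoder recomputes the ordered list L from y alone, which is why z suffices to single out the support of x.

```python
def covered_columns(M, y):
    """Ordered list L of the indices of the columns of M covered by y."""
    m, n = len(M), len(M[0])
    return [c for c in range(n) if all(M[r][c] <= y[r] for r in range(m))]


def encode(M, x, z_len):
    """Encode x as (y, z): y is the OR of the selected columns, z marks
    which entries of the ordered list L belong to the support of x."""
    support = [c for c, bit in enumerate(x) if bit == 1]
    y = [int(any(M[r][c] for c in support)) for r in range(len(M))]
    L = covered_columns(M, y)
    if len(L) > z_len:
        raise ValueError("too many covered columns for the chosen z length")
    z = [1 if c in support else 0 for c in L] + [0] * (z_len - len(L))
    return y, z


def decode(M, y, z):
    """Recover x from (y, z) by recomputing L and reading off z."""
    x = [0] * len(M[0])
    for c, bit in zip(covered_columns(M, y), z):
        if bit:
            x[c] = 1
    return x


# Hypothetical 4 x 3 matrix and a weight-2 input vector.
M = [
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 0],
    [1, 0, 0],
]
x = [1, 1, 0]
y, z = encode(M, x, z_len=4)
```

The round trip `decode(M, *encode(M, x, z_len))` returns x for any matrix, as long as the covered list fits into z; the selector property is what bounds the length of that list by 2p.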

An extension of the above reasoning can be carried out also to a scenario 
where x is generated by a probabilistic source, provided that $\Pr\{\|x\|_0 > p\}$ 
goes to zero as the length n of x grows. 

The above encoding procedure has some features which might be of some interest 
in the area of data compression. Specifically, it does not require the construction 
of a code dictionary, nor is it based on statistical analysis of the sequences to 
be compressed. Moreover, the encoding/decoding procedure only involves simple 
operations on Boolean vectors (OR's of them and checks for containment), 
which leads to fast implementations. Furthermore, the above procedure provides 
a faster alternative for optimal size enumerative encoding of low-weight binary 
sequences [12,35]. In particular, for binary vectors of Hamming weight at most 
d, our encoding/decoding procedures require time O(nd log(n/d)), whereas the 
procedures given in [35] require time $O(n \log^2 n \log\log n)$ for the encoding, and 
time $O(n \log^3 n \log\log n)$ for the decoding. 

Tracing many users (or finding many positives). In [28] the authors introduced 
k-out-of-r Multi User Tracing families, aka MUT$_k(r)$. A family $\mathcal{F}$ of n 
subsets of [m] is MUT$_k(r)$ if, given the union of ℓ ≤ r of the sets in $\mathcal{F}$, one 
is able to identify at least k of them, or all if ℓ < k. Such a definition is motivated 
by applications in multiple access channel communication and DNA computing 
(see [28] and references quoted therein). 



2 In fact, via SUPER-SELECTORS, we can provide constructions of optimal size (k, a)- 
FUT families, for any 1/2 < a < 1 - ±. 



In [1] it was proved that MUT$_k(r)$ families exist for $m = O((r + k^2) \log \frac{n}{r})$, 
determining the maximum possible rate for all $k \le \sqrt{r}$ up to a constant 
factor. Somewhat surprisingly, in all this range the rate is $\Theta(1/r)$, independently 
of k. However, no constructive proof of such "optimal" rate families has been 
provided so far. 

We can use our SUPER-SELECTORS to match such a result. Let M be a (2r, v, n)-SUPER-SELECTOR 
where the vector $v = (v_1, \ldots, v_{2r})$ is defined by: $v_i = i$ for 
i = 1, ..., k; $v_i = k$ for i = k + 1, ..., 2r − 1; and $v_{2r} = r + 1$. 

First, we notice that M is a (k, k, n)-selector, i.e., a (k − 1)-superimposed code, 
hence every union of up to k − 1 columns is unique. Moreover, for any k ≤ ℓ ≤ r, 
by Lemma 1 we have that at least k columns out of ℓ can be identified from their 
Boolean sum. These two properties show that the sets whose indicator vectors 
coincide with the columns of M form an MUT$_k(r)$ family. Therefore, Theorem 
1 applied to M provides the best known bound on the size of MUT$_k(r)$ families, 
i.e., the $O(\max\{r, k^2\} \log(n/r))$ of [1]. Our main theorem also explicitly shows 
that the result of [1] can be attained by a constructive $O(n^k)$ strategy. 

The (d, ℓ)-list disjunct matrices. Indyk et al. [25] studied (d, ℓ)-list disjunct 
matrices, which are m × n binary matrices such that the following holds: for any 
disjoint subsets S, T of columns such that |S| ≤ d and |T| ≥ ℓ, there exists a 
row where there is a 1 among the columns in T, while all the columns in S have 
a 0. Such a structure was also considered in [14, 15, 19, 8]. 

One can easily verify that a (d + ℓ, d + 1, n)-selector is also a (d, ℓ)-list disjunct 
matrix. As a consequence, our Lemma 3 (below) provides improved bounds on the 
construction of (d, ℓ)-list disjunct matrices$^3$ compared to the ones given in [25]. 

For any d ≥ ℓ, by using a (d + ℓ, d + 1, n)-selector, we obtain (d, ℓ)-list disjunct 
matrices of size $O\big(\frac{(d+\ell)^2}{\ell} \log \frac{n}{d+\ell}\big)$ for any constant d and ℓ. This improves 
on [25], particularly for d large compared to ℓ. Also for ℓ = Θ(d), and particularly 
for (d, d)-list disjunct matrices, our bound compares favorably with the 
$O((d \log n)^{1+o(1)})$ size bound given in [25] and the $O(d^{1+o(1)} \log n)$ size bound 
given in [8]. Alternatively, for d < ℓ one can see that a (2d, d + 1, n)-selector 
is also a (d, ℓ)-list disjunct matrix. Such a selector can be constructed of size 
O(d log(n/d)), in time $n^{2d+O(1)}$. 

We remark that the above results on the size of (d, ℓ)-list disjunct matrices via 
selectors are tight with respect to the lower bounds provided in [15, Theorem 
2], as reported in Table 1. 

4 Bounds on the size of a (p, v, n)-SUPER-SELECTOR 

In this section we prove the bound on the size of a (p, v, n)-SUPER-SELECTOR as 
announced in Theorem 1. First we present an immediate lower bound following 
from the ones of [7, 15] on the size of (p, k, n)-selectors. 



3 Analogous bounds, in terms of size, are derivable from [15] via (p, k, n)-selectors. 
However, their construction time is exponential in n. 



Theorem 2. The size of a (p, v, n)-SUPER-SELECTOR has to be 

$$ \Omega\Big( \max_{j=1,\ldots,p} \frac{j^2}{j - v_j + 1} \cdot \frac{\log(n/j)}{\log(j/(j - v_j + 1)) + O(1)} \Big). $$

For the upper bound, we first give a proof based on the probabilistic method 
and then derandomize it. We need the following two lemmas. 

Lemma 2. There exists a (p, v, n)-SUPER-SELECTOR of size 

$$ m = O\Big( \max_{j=1,\ldots,p} \frac{3pej}{j - v_j + 1} \log(n/j) \Big). $$

Proof. Generate the m × n binary matrix M by choosing each entry randomly 
and independently, with Pr(M[i, j] = 0) = (p − 1)/p = x. Fix an integer j ≤ p. 
Fix $S \in \binom{[n]}{j}$. For any subset R of $j - v_j + 1$ rows of $I_j$ let $E_{R,S}$ be the event 
that the submatrix M(S) does not contain any of the $(j - v_j + 1)$ rows of R. We 
have 

$$ \Pr(E_{R,S}) = \big(1 - (j - v_j + 1)\, x^{j-1} (1 - x)\big)^m. \qquad (1) $$
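
To see where (1) comes from, note that a fixed row of M(S) equals a given row of $I_j$ with probability $x^{j-1}(1-x)$, that these events are disjoint for distinct rows of $I_j$, and that the m rows of M are independent. The following Python sketch evaluates (1) numerically and samples a random matrix with the stated entry distribution. The function names and the `seed` parameter are our own illustrative assumptions.

```python
import random


def identity_row_probability(j, x):
    """Probability that a random row of M(S), |S| = j, equals one given
    row of I_j: one entry is 1 (prob. 1 - x), the other j - 1 are 0."""
    return x ** (j - 1) * (1 - x)


def prob_missing_all(m, j, v_j, x):
    """Right-hand side of (1): none of the j - v_j + 1 rows of R appears
    among the m independent rows of M(S)."""
    q = (j - v_j + 1) * identity_row_probability(j, x)
    return (1 - q) ** m


def random_matrix(m, n, p, seed=None):
    """Sample an m x n 0/1 matrix whose entries are independently 0 with
    probability (p - 1) / p, as in the proof."""
    rng = random.Random(seed)
    x = (p - 1) / p
    return [[0 if rng.random() < x else 1 for _ in range(n)]
            for _ in range(m)]


# Example: p = j = v_j = 2 and m = 2 rows, so x = 1/2 and (1) gives 0.75 ** 2.
value = prob_missing_all(m=2, j=2, v_j=2, x=0.5)
sample = random_matrix(3, 4, p=2, seed=0)
```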

Let $R_1, \ldots, R_t$, $t = \binom{j}{j - v_j + 1}$, be all possible subsets of exactly $j - v_j + 1$ rows 
of the matrix $I_j$, and let $N_S$ be the event that, for some index $i \in \{1, \ldots, t\}$, 
the sub-matrix M(S) does not contain any of the rows of the subset $R_i$. By the 
union bound we have 

$$ \Pr(N_S) = \Pr\Big( \bigcup_{i=1}^{t} E_{R_i,S} \Big) \le \binom{j}{j - v_j + 1} \big(1 - (j - v_j + 1)\, x^{j-1}(1 - x)\big)^m. \qquad (2) $$

One can see that $N_S$ coincides with the event that the sub-matrix M(S) 
contains strictly fewer than $v_j$ rows of $I_j$. To see this, it is enough to observe that 
if M(S) contains fewer than $v_j$ rows of $I_j$, then there is some i such that 
M(S) does not contain any of the rows in $R_i$. 

Let $Y_M$ denote the event that the matrix M is a (p, v, n)-SUPER-SELECTOR. 
We can use again the union bound to estimate the probability of the negated 
event $\overline{Y}_M$. If M is not a (p, v, n)-SUPER-SELECTOR then there exists an integer 
$j \in [p]$ such that for some $S \in \binom{[n]}{j}$ the event $N_S$ happens. Therefore, 

$$ \Pr(\overline{Y}_M) = \Pr\Big( \bigcup_{j=1}^{p} \bigcup_{S \in \binom{[n]}{j}} N_S \Big) \le \sum_{j=1}^{p} \binom{n}{j} \binom{j}{j - v_j + 1} \big(1 - (j - v_j + 1)\, x^{j-1}(1 - x)\big)^m, $$

whence we obtain: 

$$ \Pr(Y_M) \ge 1 - \sum_{j=1}^{p} \binom{n}{j} \binom{j}{j - v_j + 1} \big(1 - (j - v_j + 1)\, x^{j-1}(1 - x)\big)^m. \qquad (3) $$

By the probabilistic method, there exists a (p, v, n)-SUPER-SELECTOR of size 
$m^* = \min\{m \ge 1 \mid \Pr(Y_M) > 0\}$. The rest of the proof will consist in showing that 
m* satisfies the bound claimed. 

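Let us focus on the value $c_j$ such that, for $m = c_j \log(n/j)$, the j-th summand in (3) 
satisfies the following inequality 

$$ \binom{n}{j} \binom{j}{j - v_j + 1} \big(1 - (j - v_j + 1)\, x^{j-1}(1 - x)\big)^{c_j \log(n/j)} < 1/p. \qquad (4) $$

We shall use the following two inequalities$^4$ 

$$ \big(1 - (j - v_j + 1)\, x^{j-1}(1 - x)\big)^{c_j \log(n/j)} \le \Big(\frac{n}{j}\Big)^{-\frac{(j - v_j + 1) c_j}{pe}}, \qquad (5) $$

$$ \binom{n}{j} \binom{j}{j - v_j + 1} \le \Big(\frac{en}{j}\Big)^{j} 2^{j}. \qquad (6) $$

By (5)-(6), we have that the left-hand side of (4) can be upper bounded by 

$$ \Big(\frac{en}{j}\Big)^{j} 2^{j} \Big(\frac{n}{j}\Big)^{-\frac{(j - v_j + 1) c_j}{pe}}. \qquad (7) $$

Therefore, if we take $c_j = \frac{3pej}{j - v_j + 1}$ we have that (7) can be further upper 
bounded by $n^{-2j} e^{2j} j^{2j}$, which is not larger than 1/p for all n > 20 and 
n > p ≥ j > 0. Therefore, by taking 

$$ m = \max_{j=1,\ldots,p} c_j \log(n/j) = \max_{j=1,\ldots,p} \frac{3pej}{j - v_j + 1} \log\frac{n}{j} \qquad (8) $$

we can have each of the summands in (3) smaller than 1/p, hence guaranteeing 
$\Pr(Y_M) > 0$. By definition m* ≤ m, which concludes the proof. □ 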

The same analysis as above, tailored for a (p, k, n)-selector gives the following 
bound, whose proof is deferred to the appendix. 



Lemma 3. For each 0 < k < p < n, there exists a (p, k, n)-selector of size 
m =(\o g2 —^)j p i og ^(i + (i))<_J|L_i og ^(i + (i)). (9 ) 

Moreover, there exists a (p, p, n)-selector of size $m = \frac{e p^2}{\log_2 e} \log(n/p)\,(1 + o(1))$. 

We can now combine the last two lemmas to obtain the main result of this 
section, providing an almost tight upper bound on the size of a SUPER-SELECTOR. 



4 A step by step computation is in the appendix. 



Theorem 3. There exists a (p, v, n)-SUPER-SELECTOR of size 

$$ m = O\Big( \max_{j=1,\ldots,p} k_j \log(n/j) \Big), \quad \text{where } k_j = \min\Big\{ \frac{3pej}{j - v_j + 1}, \frac{e j^2}{\log_2 e} \Big\}. $$

Proof. Fix $k = \max\big\{ j \mid \frac{3pej}{j - v_j + 1} > \frac{e j^2}{\log_2 e} \big\}$. Let $M_1$ be a minimum size (k, k, n)-selector. 
In particular this is a $(k, \langle 1, 2, \ldots, k \rangle, n)$-SUPER-SELECTOR, hence a 
fortiori it is also a $(k, \langle v_1, \ldots, v_k \rangle, n)$-SUPER-SELECTOR. 

Let $M_2$ be a minimum size $(p, (0, \ldots, 0, v_{k+1}, \ldots, v_p), n)$-SUPER-SELECTOR. 

Let M be the binary matrix obtained by pasting together, one on top of the 
other, $M_1$ and $M_2$. It is not hard to see that M is a (p, v, n)-SUPER-SELECTOR. 
By Lemmas 3 and 2, M satisfies the desired bound. The proof is complete. □ 

Remark 3. Note that, if there exists a constant α such that $v_j \le \alpha j$ for each 
$\sqrt{p} < j \le p$, then the size of the super-selector is $O(p \log \frac{n}{p})$, matching the 
information theoretic lower bound. Particular cases are given by instances where 
for each j, we have $v_j = f_j(j)$ for some function $f_j$ such that $f_j(j) = o(j)$. 

Deterministic construction. By using the method of the conditional expec- 
tations (see, e.g., [30]) we can derandomize the result of the previous section and 
provide a deterministic construction of the (p, v, n)-SUPER-SELECTOR of Theo- 
rem 3 which is polynomial in n but exponential in the second parameter p. More 
precisely we obtain the following result, whose proof is deferred to the appendix. 

Theorem 4. There exists a deterministic $O(p^3 n^{p+1} \log n)$ construction of the 
(p, v, n)-super-selector given by Theorem 3. 
APPENDIX 



A The proof of Theorem 2 

Theorem 2. The size of a (p, v, n)-SUPER-SELECTOR has to be 

$$ \Omega\Big( \max_{j=1,\ldots,p} \frac{j^2}{j - v_j + 1} \cdot \frac{\log(n/j)}{\log(j/(j - v_j + 1)) + O(1)} \Big). $$

Proof. By definition, a (p, v, n)-SUPER-SELECTOR is simultaneously a $(j, v_j, n)$-selector 
for each j = 1, ..., p. Therefore, obviously, the SUPER-SELECTOR's size 
is at least as large as the size of the largest $(j, v_j, n)$-selector it includes, over all 
j = 1, ..., p. The desired result now directly follows from [7, Theorem 2], which 
states that any $(j, v_j, n)$-selector has size $\Omega\Big( \frac{j^2}{j - v_j + 1} \cdot \frac{\log(n/j)}{\log(j/(j - v_j + 1)) + O(1)} \Big)$. 



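B The calculations for inequalities (5) and (6) 

As regards (5), recalling that x = (p − 1)/p = 1 − 1/p, we have 

$$ \big(1 - (j - v_j + 1)\, x^{j-1}(1 - x)\big)^{c_j \log(n/j)} = \Big(1 - \frac{j - v_j + 1}{p} \Big(1 - \frac{1}{p}\Big)^{j-1}\Big)^{c_j \log(n/j)} $$
$$ \le \Big(1 - \frac{j - v_j + 1}{p} \Big(1 - \frac{1}{p}\Big)^{p-1}\Big)^{c_j \log(n/j)} $$
$$ \le \Big(1 - \frac{j - v_j + 1}{pe}\Big)^{c_j \log(n/j)} $$
$$ \le \Big(\frac{n}{j}\Big)^{-\frac{(j - v_j + 1) c_j}{pe}}. $$

As regards (6), we only need to use the known inequalities $\binom{a}{b} \le \big(\frac{ea}{b}\big)^{b}$ and 
$\binom{a}{b} \le 2^{a}$, so we have 

$$ \binom{n}{j} \binom{j}{j - v_j + 1} \le \Big(\frac{en}{j}\Big)^{j} 2^{j}. $$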


C The construction of the (p, v, n)-SUPER-SELECTOR 



In this appendix we prove Theorem 4 presenting a derandomized construction 
of the (p, v, n)-SUPER-SELECTOR of Theorem 3. 

We shall need the following technical fact whose proof is immediate. 

Fact 1. Fix integers m, p ≥ 1 and 0 ≤ k' ≤ k ≤ p. Let A be a subset of k distinct 
rows of the identity matrix $I_p$. Let x = (p − 1)/p and M be a randomly generated 
m × p binary matrix with each entry being independently chosen to be 0 with 
probability x. Let f(m, k', k) denote the probability that at least k' distinct rows 
of A appear in M. Then, it holds that 

$$ f(m, k', k) = \begin{cases} (1 - \alpha k)\, f(m-1, k', k) + \alpha k\, f(m-1, k'-1, k-1) & \text{if } m \ge k' > 0, \\ 1 & \text{if } k' = 0, \\ 0 & \text{if } m < k', \end{cases} $$

where $\alpha = x^{p-1}(1 - x)$ is the probability of generating a particular row of A. 

By using the above expression, we can compute in $O(k^2 m)$ time the complete 
table of values f(a, b, c) for each a = 1, ..., m, b = 1, ..., k', c = 1, ..., k. 
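
The recurrence of Fact 1 translates directly into a memoized function. The sketch below is illustrative only: the factory name `make_f` is hypothetical, and it assumes the reading $\alpha = x^{p-1}(1-x)$ of the garbled source formula, i.e., one entry equal to 1 and p − 1 entries equal to 0. Memoization plays the role of the table mentioned above.

```python
from functools import lru_cache


def make_f(p):
    """Return (f, alpha) for rows of length p, where f(m, k_prime, k) is the
    probability that at least k_prime of k fixed distinct rows of I_p
    appear among m independent random rows."""
    x = (p - 1) / p
    alpha = x ** (p - 1) * (1 - x)  # probability of one particular row of I_p

    @lru_cache(maxsize=None)
    def f(m, k_prime, k):
        if k_prime == 0:
            return 1.0          # nothing left to find
        if m < k_prime:
            return 0.0          # not enough rows left
        # Condition on the first row: either it misses all k wanted rows,
        # or it hits one of them, leaving k - 1 wanted rows and k_prime - 1
        # still to be found among the remaining m - 1 rows.
        return ((1 - alpha * k) * f(m - 1, k_prime, k)
                + alpha * k * f(m - 1, k_prime - 1, k - 1))

    return f, alpha


f, alpha = make_f(p=2)   # x = 1/2, alpha = 1/4
both_rows = f(2, 2, 2)   # two random rows must be the two rows of I_2
```

A quick sanity check of the recurrence: for k' = 1 it unrolls to $f(m, 1, k) = 1 - (1 - \alpha k)^m$, the probability that at least one of the m rows hits the set A.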

We limit ourselves to discuss the derandomization of the proof of Lemma 2. 
The same ideas can be used to derandomize the construction of the selectors 
provided by Lemma 3, which are needed to construct the SUPER-SELECTOR 
satisfying Theorem 3. 

For each j = 1, ..., p and each subset S of j columns of M let $X_S^{(j)}$ be 
the indicator random variable which is equal to 1 if M(S) contains at least $v_j$ 
rows of the identity matrix $I_j$. Let $X = \sum_{j=1}^{p} \sum_{S \in \binom{[n]}{j}} X_S^{(j)}$. It follows that 
$E[X] = \sum_{j=1}^{p} \sum_{S \in \binom{[n]}{j}} \Pr(X_S^{(j)} = 1)$. Since $\Pr(X_S^{(j)} = 1) = \Pr(\overline{N}_S)$, by (2), we 
have 

$$ E[X] \ge \sum_{j=1}^{p} \binom{n}{j} \Big( 1 - \binom{j}{j - v_j + 1} \big(1 - (j - v_j + 1)\, x^{j-1}(1 - x)\big)^m \Big). \qquad (10) $$

From Section 4, in particular from equation (4), we know that the choice of m 
satisfying (8) guarantees 

$$ \binom{n}{j} \binom{j}{j - v_j + 1} \big(1 - (j - v_j + 1)\, x^{j-1}(1 - x)\big)^m < \frac{1}{p}. $$

This, together with (10), gives 

$$ E[X] > \sum_{j=1}^{p} \Big( \binom{n}{j} - \frac{1}{p} \Big) = \sum_{j=1}^{p} \binom{n}{j} - 1. \qquad (11) $$

This quantity bounds the expected total number of sub-matrices of j 
columns (summed up over all j = 1, ..., p) with at least $v_j$ rows of the identity 
matrix $I_j$, assuming each entry being 0 with probability (p − 1)/p. 




We now choose the entries of M one at a time, trying to maximize the above 
expectation conditioned on the entries already chosen. We shall construct M 
row by row. M[r, c] will denote the entry in row r and column c. Once the entry 
M[r, c] has been fixed, we use \x rc to denote its value. 

For each r = 1, . . . , m, and c = 1, . . . , n, let X[r, c] be the expected value 
of X conditioned on the choices of the entries made before chosing entry (r, c) . 
Also, let l [r, c] and X x [r, c] be the same but also conditioned to M[r, c] = or 
M[r, c] = 1 respectively. Let -<i ex denote the lexicographic order among pairs, 
i.e., (x, y) <i ex (x',y') iff x,x' or x = x' and y < y' . 

We have 

1 | M[r',c'\ = Hr'c'Jor each (r',c') <i ex (r,c)j . 



1 | M[r', c'] = fi r 'c', for each (r', c') -<\ ex (r, c) and M[r, c] = 



1 | M[r',c'] = ^ r ' c ',for each (r',c') ^; ea; (r, c) and M[r, c] = lj . 

In accordance with the method of conditional expectations, we set $M[r, c] = 0$ 
if and only if $X_0[r, c] \geq X_1[r, c]$. 

It is not hard to see that this leads to the construction of the desired selector. 
We have 

\[ \max\{X_0[r,c-1], X_1[r,c-1]\} = X[r,c] = \frac{p-1}{p} X_0[r,c] + \frac{1}{p} X_1[r,c] \leq \max\{X_0[r,c], X_1[r,c]\} = X[r,c+1], \]

where the first and the last equalities follow from the definition of the strategy and 
the second equality from the definition of conditional expectation. 

This shows that the expectation $X[r, c]$ is monotonically non-decreasing.⁵ By (11), 
we start with $X[1, 1] > \sum_{j=1}^{p} \binom{n}{j} - 1$. Moreover, once all the entries have been 
chosen, the above expectation is in fact the actual number of sub-matrices 
satisfying the super-selector conditions. This must be an integer and, by the 
starting condition and the above monotonicity, it is greater than $\sum_{j=1}^{p} \binom{n}{j} - 1$. 
Since there are only $\sum_{j=1}^{p} \binom{n}{j}$ sub-matrices in total, every one of them satisfies 
its condition, which means that the matrix $M$ we have so constructed is indeed a 
$(p, k, n)$-selector. 
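
To make the greedy rule concrete, the following minimal Python sketch applies the method of conditional expectations to a toy $(p, k, n)$-selector, counting only sub-matrices of exactly $p$ columns. It is an illustration under stated assumptions, not the efficient procedure of this appendix: the conditional expectations are obtained by brute-force enumeration of all completions of the matrix, which is feasible only for very small $m$ and $n$. The function names `count_good`, `cond_expectation` and `derandomize`, and the parameters $m = 3$, $n = 4$, $p = 2$, $k = 1$, are our own choices. For these values the initial expectation is $6 \cdot 7/8 = 5.25 > 5$, so the monotonicity argument above applies.

```python
from fractions import Fraction
from itertools import combinations, product


def count_good(M, p, k):
    """Number of p-subsets S of columns such that M(S) contains
    at least k distinct rows of the identity matrix I_p."""
    n = len(M[0])
    good = 0
    for S in combinations(range(n), p):
        seen = set()
        for row in M:
            sub = [row[c] for c in S]
            if sum(sub) == 1:          # the restricted row is a row of I_p
                seen.add(sub.index(1))
        good += len(seen) >= k
    return good


def cond_expectation(fixed, m, n, p, k, x):
    """E[X | the first len(fixed) entries, in row-major order, equal `fixed`].
    Brute force: enumerates every completion (toy sizes only)."""
    free = m * n - len(fixed)
    total = Fraction(0)
    for tail in product((0, 1), repeat=free):
        flat = list(fixed) + list(tail)
        M = [flat[r * n:(r + 1) * n] for r in range(m)]
        weight = Fraction(1)
        for bit in tail:               # each entry is 0 with probability x
            weight *= x if bit == 0 else 1 - x
        total += weight * count_good(M, p, k)
    return total


def derandomize(m, n, p, k):
    """Fix the entries one at a time, never letting the conditional
    expectation decrease (ties are broken in favour of 0)."""
    x = Fraction(p - 1, p)
    fixed = []
    for _ in range(m * n):
        x0 = cond_expectation(fixed + [0], m, n, p, k, x)
        x1 = cond_expectation(fixed + [1], m, n, p, k, x)
        fixed.append(0 if x0 >= x1 else 1)
    return [fixed[r * n:(r + 1) * n] for r in range(m)]


M = derandomize(3, 4, 2, 1)
print(M, count_good(M, 2, 1))
```

The exact rational arithmetic (`Fraction`) is a design choice of this sketch: it avoids rounding when comparing the two conditional expectations.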

We also have to show that we can compute $X_0[r, c]$ and $X_1[r, c]$ "efficiently". 
Let us fix j and focus on a single subset S of j columns and the corresponding 
sub-matrix M(S). It will be enough to show that we can compute efficiently the 
following quantity: 

\[ X_S[r,c] = \Pr\left( X_S^{(j)} = 1 \;\middle|\; M[r',c'] = \mu_{r'c'} \text{ for each } (r',c') \prec_{lex} (r,c) \right), \]

which is the probability of having at least $v_j$ rows of the identity matrix $I_j$ in the 
sub-matrix $M(S)$ given the choices made so far in $M$ up to $M[r, c-1]$. In fact, the 
computation of $X[r, c]$, $X_0[r, c]$, $X_1[r, c]$ involves at most $\sum_{j=1}^{p} \binom{n}{j}$ probabilities 
$X_S[r', c']$. 

⁵ For the sake of the presentation, we are here tacitly assuming that $1 < c < n$. It is 
not difficult to extend the argument also to the extreme cases when $c \in \{1, n\}$, i.e., 
when the expectations involved are about consecutive rows of $M$. 

Suppose we are about to choose the value of $M[r, c]$. Let $a = a^S_{r-1}$ be the 
number of rows from $I_j$ which already appear in the first $r - 1$ rows of $M(S)$, 
given the entries fixed so far. 

1. If the $c$-th column of $M$ coincides with the first column of $M(S)$, then no 
entry has been chosen so far in the $r$-th row of $M(S)$ and, recalling Fact 1, 
it should not be difficult to see that we have 

\[ X_S[r,c] = f(m - r + 1,\; v_j - a,\; j - a). \]

2. Otherwise, we have one of the following three cases: 

(i) the $r$-th row of $M(S)$ cannot be one of the rows of $I_j$ which are not 
already in the first $r - 1$ rows of $M(S)$, or there are already two entries 
with value 1. Therefore, 

\[ X_S[r,c] = f(m - r,\; v_j - a,\; j - a). \]

(ii) among the $c - 1$ entries which have already been fixed in the $r$-th row 
of $M(S)$, there exists exactly one entry which is equal to 1. Moreover, 
there is exactly one choice of the remaining entries on row $r$ such that 
this row becomes one of the $j - a$ rows of $I_j$ which do not appear among 
the first $r - 1$ rows of $M(S)$. In particular, if all the remaining entries 
of row $r$ are chosen to be 0, then this becomes one of the rows of $I_j$ not 
yet in $M(S)$. Therefore the probability that $M(S)$ ends up containing 
$v_j$ rows of $I_j$ becomes the probability that in the remaining $m - r$ rows there 
are at least $v_j - a - 1$ rows from the $j - a - 1$ not appearing in the first 
$r$ rows of $M(S)$. Thus, 

\[ X_S[r,c] = x^{j-c+1} f(m - r,\; v_j - a - 1,\; j - a - 1) + \left(1 - x^{j-c+1}\right) f(m - r,\; v_j - a,\; j - a). \]

(iii) there is no 1 entry among the first $c - 1$ entries already fixed in row $r$. 
Furthermore, among the $j - a$ rows of $I_j$ which are not in the first $r - 1$ 
rows of $M(S)$, there are exactly $b$ rows which have only zeroes in the 
first $c - 1$ positions. These are exactly the only rows of $I_j$ which could 
appear in row $r$ of $M(S)$ given the choices made so far. If the $r$-th row of 
$M(S)$ ends up being one of these rows, which happens with probability 
$b\,x^{j-c}(1 - x)$, then the probability of $M(S)$ containing $v_j$ rows from $I_j$ 
is the same as the probability of having $v_j - a - 1$ rows out of the $j - a - 1$ 
many which are not in the first $r$ rows of $M(S)$ in a randomly generated 
matrix with $m - r$ rows. Otherwise, the probability of having $v_j$ rows 
of $I_j$ in $M(S)$ is the same as the probability of having, in a randomly 
generated matrix with $m - r$ rows, at least $v_j - a$ rows out of the $j - a$ which 
are not in the first $r - 1$ rows so far chosen for $M(S)$. Therefore, 

\[ X_S[r,c] = b\,x^{j-c}(1 - x)\, f(m - r,\; v_j - a - 1,\; j - a - 1) + \left(1 - b\,x^{j-c}(1 - x)\right) f(m - r,\; v_j - a,\; j - a). \]
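
The case analysis above can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation. Since Fact 1 is not restated here, the helper `make_f` encodes an assumption: we take $f(a, b, c)$ to be the probability that $a$ random rows contain at least $b$ distinct rows out of $c$ fixed rows of $I_j$, and compute it by a simple recurrence of our own. The names `make_f`, `identity_index` and `cond_prob`, and the representation of the state (the completed rows of $M(S)$ plus the already fixed prefix of the current row, assumed shorter than $j$), are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache


def make_f(j, x):
    """Return f(a, b, c): probability that `a` random rows of length j
    (entries 0 with probability x) contain at least `b` distinct rows
    out of `c` fixed rows of I_j. Assumed reading of Fact 1."""
    q = x ** (j - 1) * (1 - x)   # a random row equals one fixed row of I_j

    @lru_cache(maxsize=None)
    def f(a, b, c):
        if b <= 0:
            return Fraction(1)
        if a == 0 or b > c:
            return Fraction(0)
        return c * q * f(a - 1, b - 1, c - 1) + (1 - c * q) * f(a - 1, b, c)

    return f


def identity_index(row):
    """Index i if `row` is the i-th row of the identity matrix, else None."""
    ones = [i for i, bit in enumerate(row) if bit == 1]
    return ones[0] if len(ones) == 1 else None


def cond_prob(prefix_rows, partial_row, m, j, v, x):
    """X_S[r, c]: probability that M(S) ends up with >= v rows of I_j,
    given its first r-1 rows and the fixed prefix of row r."""
    f = make_f(j, x)
    present = {identity_index(row) for row in prefix_rows} - {None}
    a = len(present)
    r = len(prefix_rows) + 1
    need, avail = v - a, j - a
    if not partial_row:                       # case 1: row r untouched
        return f(m - r + 1, need, avail)
    rem = j - len(partial_row)                # entries of row r still free
    ones = [i for i, bit in enumerate(partial_row) if bit == 1]
    miss = f(m - r, need, avail)
    hit = f(m - r, need - 1, avail - 1)
    if len(ones) >= 2 or (len(ones) == 1 and ones[0] in present):
        return miss                           # case 2(i): no new row possible
    if len(ones) == 1:                        # case 2(ii): rest must be 0
        p_hit = x ** rem
    else:                                     # case 2(iii): b candidate rows
        b = sum(1 for i in range(len(partial_row), j) if i not in present)
        p_hit = b * x ** (rem - 1) * (1 - x)
    return p_hit * hit + (1 - p_hit) * miss


half = Fraction(1, 2)
print(cond_prob([], (1,), 2, 2, 1, half))
```

In the notation of the text, `len(partial_row)` plays the role of $c - 1$, so `rem` equals $j - c + 1$.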



C.1 Estimating the time complexity of the derandomized strategy 



For each $r = 1, \ldots, m$ and $c = 1, \ldots, n$, the computation of the entry $M[r, c]$ 
requires that, for at most all the $\sum_{j=1}^{p} \binom{n}{j} = O(p n^p)$ sets of columns $S$, we look 
up a constant number of values of $f(\cdot, \cdot, \cdot)$. Recall that all such values have been 
precomputed in time $O(p^2 m)$. Moreover, we need to keep track, for each $M(S)$, 
of the number $a^S_{r-1}$ of rows of $I_j$ which already appear in the first $r - 1$ rows of 
$M(S)$ and of the $b$ rows of $I_j$ which coincide in the first $c - 1$ bits with the first 
$c - 1$ bits of row $r$ of $M(S)$. By indexing, this can be easily done in $O(p)$ time. In 
total, we spend 

\[ O\left(nm \times p n^p \times p + p^2 \times m\right) = O\left(p^3 n^{p+1} \log n\right). \]

This completes the proof of Theorem 4. 



D The Proof of Lemma 3 

D.1 Some useful estimates 

We shall need the following technical facts. 
Lemma 4. Fix an integer $p > 1$ and let $x = \frac{p-1}{p}$. 

(a) For $0 < \epsilon < 1$, it holds that 



1 \ 1 ( e \ 1 2p 



lo §2 Ti 1 , ^ n -m vT ^ lo S2 < 



{l-{ep+l)xP- 1 (l-x))J ~\ b e-eJ l+pe' 
(b) Moreover (for $\epsilon = 0$) we have 

\[ \left( \log_2 \frac{1}{1 - x^{p-1}(1 - x)} \right)^{-1} \leq \frac{e\,p}{\log_2 e}. \]
Corollary 1. Fix an integer $p > 1$. Then, for $x = \frac{p-1}{p}$, it holds that 

\[ \left( \log_2 \frac{1}{1 - \left(\frac{p}{2} + 1\right) x^{p-1}(1 - x)} \right)^{-1} \leq 3.411. \]



D.2 $(p, k, n)$-selectors exist of size $O(p \log \frac{n}{p})$: yet another proof! 

Let $m, n, p > 1$ be integers and $M$ be an $m \times n$ binary matrix. Recall that for 
each $S \subseteq \{1, \ldots, n\}$, $|S| = p$, we denote by $M(S)$ the $m \times p$ submatrix of $M$ 
consisting of all columns of $M$ whose indices are in $S$. 

Fix integers $m, n, p > 1$ and generate an $m \times n$ binary matrix $M$ by choosing 
each entry randomly and independently, with $\Pr(M[i,j] = 0) = (p - 1)/p = x$. 
For any integer $k$, $1 \leq k \leq p$, and for any subset $R$ of $p - k + 1$ rows of $I_p$, let 
$E_{R,S}$ be the event that matrix $M(S)$ does not contain any of the $(p - k + 1)$ 
rows of $R$. We have 

\[ \Pr(E_{R,S}) = \left( 1 - (p - k + 1)\, x^{p-1} (1 - x) \right)^{m}. \tag{12} \]

Let $R_1, \ldots, R_t$, $t = \binom{p}{p-k+1}$, be all possible subsets of exactly $p - k + 1$ rows of 
matrix $I_p$, and let $E_S$ be the event that the sub-matrix $M(S)$ does not contain 
any rows of some subset $R_i$. By the union bound we have 

\[ \Pr(E_S) = \Pr(E_{R_1,S} \vee \cdots \vee E_{R_t,S}) \tag{13} \]
\[ \leq \binom{p}{p-k+1} \left( 1 - (p - k + 1)\, x^{p-1} (1 - x) \right)^{m} = q. \tag{14} \]

Let us denote by $N_S$ the event that the sub-matrix $M(S)$ does not contain at 
least $k$ rows of $I_p$. One can see that $\Pr(N_S) = \Pr(E_S)$. To see this, it is enough 
to observe that if $M(S)$ does not contain at least $k$ rows of $I_p$ it means that there 
is some $R_i$ such that $M(S)$ does not contain any of the rows in $R_i$. Consequently 

\[ \Pr(N_S) \leq q. \tag{15} \]
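
As a sanity check on (14) and (15), the following Python sketch compares the union bound $q$ with the exact value of $\Pr(N_S)$ for small parameters. The exact value is obtained from a short recurrence of our own (it is not taken from the text): each random row equals a fixed row of $I_p$ with probability $x^{p-1}(1-x)$, and we track how many distinct rows are still needed. The function name `miss_probabilities` is hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


def miss_probabilities(m, p, k):
    """Return (exact Pr(N_S), union bound q of (14)) for an m-row
    random matrix whose entries are 0 with probability x = (p-1)/p."""
    x = Fraction(p - 1, p)
    q_row = x ** (p - 1) * (1 - x)   # one random row equals a fixed row of I_p

    @lru_cache(maxsize=None)
    def at_least(rows, need, targets):
        # Probability that `rows` random rows contain at least `need`
        # distinct rows out of `targets` fixed rows of I_p.
        if need <= 0:
            return Fraction(1)
        if rows == 0 or need > targets:
            return Fraction(0)
        hit = targets * q_row
        return (hit * at_least(rows - 1, need - 1, targets - 1)
                + (1 - hit) * at_least(rows - 1, need, targets))

    exact = 1 - at_least(m, k, p)
    union = comb(p, p - k + 1) * (1 - (p - k + 1) * q_row) ** m
    return exact, union


exact, union = miss_probabilities(9, 2, 2)
print(float(exact), float(union))
```

For $k = 1$ there is a single subset $R$, so the two values coincide; for larger $k$ the union bound is in general strictly larger than the exact probability.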

There are $\binom{n}{p}$ events $N_S$, one for each $S \subseteq \{1, \ldots, n\}$ of cardinality $p$. 

Let $Y_M$ denote the event that the matrix $M$ is a $(p, k, n)$-selector. If $M$ is a 
$(p, k, n)$-selector it means that there exists no set $S$ such that the event $N_S$ 
happens. We can again use the union bound to estimate the probability of the 
negated event $\overline{Y_M}$, as 

\[ \Pr(\overline{Y_M}) = \Pr\Big( \bigvee_{S \in \binom{[n]}{p}} N_S \Big) \leq \binom{n}{p}\, q, \]



whence we obtain: 

\[ \Pr(Y_M) \geq 1 - \binom{n}{p} \binom{p}{p-k+1} \left( 1 - (p - k + 1)\, x^{p-1} (1 - x) \right)^{m}. \tag{16} \]
Let 

\[ m^* = \operatorname{argmin}_{m \geq 1} \{ \Pr(Y_M) > 0 \}. \]

One can conclude that there exists a $(p, k, n)$-selector of size $m^*$. 
We have to show that $m^* = O(p \log \frac{n}{p})$. 
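
For concrete parameters, the smallest $m$ for which the right-hand side of (16) is positive can be found by direct search. The following Python sketch does this with exact rational arithmetic. Note that it only certifies positivity of the lower bound (16), so the value it returns is an upper bound on $m^*$ rather than $m^*$ itself. The function names `failure_bound` and `smallest_m` are our own.

```python
from fractions import Fraction
from math import comb


def failure_bound(m, n, p, k):
    """Union-bound estimate of Pr(M is not a (p, k, n)-selector),
    i.e. the quantity subtracted from 1 in (16)."""
    x = Fraction(p - 1, p)
    per_subset = 1 - (p - k + 1) * x ** (p - 1) * (1 - x)
    return comb(n, p) * comb(p, p - k + 1) * per_subset ** m


def smallest_m(n, p, k):
    """Smallest m >= 1 making the right-hand side of (16) positive."""
    m = 1
    while failure_bound(m, n, p, k) >= 1:
        m += 1
    return m


print(smallest_m(4, 2, 1), smallest_m(4, 2, 2))
```

The search terminates because, for $p > 1$ and $1 \leq k \leq p$, the base $1 - (p-k+1)x^{p-1}(1-x)$ lies strictly between 0 and 1.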

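We can use $\binom{a}{b} \leq \left(\frac{ea}{b}\right)^{b}$ to bound the two binomial coefficients. We have 

\[ \Pr(Y_M) \geq 1 - \left(\frac{ne}{p}\right)^{p} \left(\frac{pe}{p-k+1}\right)^{p-k+1} \left( 1 - (p - k + 1)\, x^{p-1} (1 - x) \right)^{m}. \]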



The last quantity is positive for any $m$ such that 

\[ 1 - \left(\frac{ne}{p}\right)^{p} \left(\frac{pe}{p-k+1}\right)^{p-k+1} \left( 1 - (p - k + 1)\, x^{p-1} (1 - x) \right)^{m} > 0, \]



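which means 

\[ \left( 1 - (p - k + 1)\, x^{p-1} (1 - x) \right)^{m} < \left[ \left(\frac{ne}{p}\right)^{p} \left(\frac{pe}{p-k+1}\right)^{p-k+1} \right]^{-1}, \]

i.e., 

\[ m > \frac{\log_2 \left[ \left(\frac{ne}{p}\right)^{p} \left(\frac{pe}{p-k+1}\right)^{p-k+1} \right]}{\log_2 \frac{1}{1 - (p-k+1)\, x^{p-1}(1-x)}} = \frac{p \log_2 \frac{n}{p} + (2p - k + 1) \log_2 e + (p - k + 1) \log_2 \frac{p}{p-k+1}}{\log_2 \frac{1}{1 - (p-k+1)\, x^{p-1}(1-x)}}. \]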
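
The threshold just derived is easy to evaluate numerically. The following Python sketch computes the right-hand side of the last inequality and checks, for one sample choice of parameters, that the first integer above it does make the union-bound estimate of (16) smaller than 1. The function names `miss_base` and `log_threshold` and the sample values $n = 100$, $p = 4$, $k = 2$ are our own; floating-point arithmetic is used, so the check is illustrative rather than a proof.

```python
from math import comb, e, floor, log2


def miss_base(p, k):
    """The base 1 - (p-k+1) x^(p-1) (1-x) with x = (p-1)/p."""
    x = (p - 1) / p
    return 1 - (p - k + 1) * x ** (p - 1) * (1 - x)


def log_threshold(n, p, k):
    """Right-hand side of the inequality m > ... derived above."""
    numerator = (p * log2(n / p)
                 + (2 * p - k + 1) * log2(e)
                 + (p - k + 1) * log2(p / (p - k + 1)))
    return numerator / log2(1 / miss_base(p, k))


n, p, k = 100, 4, 2
m = floor(log_threshold(n, p, k)) + 1   # first integer above the threshold
estimate = comb(n, p) * comb(p, p - k + 1) * miss_base(p, k) ** m
print(m, estimate)
```

Because the threshold uses the weaker estimates $\binom{a}{b} \leq (ea/b)^b$, it is typically larger than the smallest $m$ for which (16) itself is positive.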

By Lemma 4 we have that for any fixed $0 < \alpha < 1$ and $k = \alpha p$, we can bound 
$m^*$ as 

\[ m^* \leq \frac{p \log_2 \frac{n}{p} + A_{p,k}}{\log_2 e - \log_2 (e - 1 + \alpha)}, \]

where $A_{p,k}$ is a constant only depending on $p$ and $k$. I.e., we can find a $(p, k, n)$-selector 
of size $O(p \log \frac{n}{p})$. More precisely and in the spirit of the lower bounds 
of [7], the estimates in Lemma 4 show that the size $m^*$ of the $(p, k, n)$-selector 
whose existence is guaranteed by the probabilistic method is bounded by 

\[ \frac{2p^2}{p - k + 1} \log_2 \frac{n}{p}\, (1 + o(1)). \]

Notice that for $\alpha = 1$, using Lemma 4 (b) (with $\epsilon = 1 - \alpha$), we get the well 
known quadratic bound on the size of superimposed codes. 

In some applications, as in the case of the $(d, d)$-list disjunct matrices, of 
particular interest is the case $\alpha = 1/2$. For such a case, using Corollary 1 we have 

\[ m^* \leq 3.411 \left( p \log_2 \frac{n}{p} + A_{p,k} \right), \]

which proves the desired result on the existence of a $(p, p/2, n)$-selector of size 
$3.411\, p \log_2 \frac{n}{p}\, (1 + o(1))$. 



